Abstract

We provide a unification of many statistical methods for traditional small data sets
and emerging big data sets by viewing them as modeling a sample of size $n$ of variables
$(X_1, \ldots, X_p, Y_1, \ldots, Y_q)$; a variable can be discrete or continuous. The case $p = q = 1$ is
considered first, because a major tool in the study of dependence is finding pairs of variables
which are most dependent. Classification problem: $Y$ is 0-1.

For each variable $X$ we construct orthonormal score functions $T_j(x; X)$, $x$ an observable
value of $X$. They are functions of $F^{\mathrm{mid}}(x; X) = F(x; X) - .5\,p(x; X)$; approximately
$T_j(x; X) = \mathrm{Leg}_j(F^{\mathrm{mid}}(x; X))$, where $\mathrm{Leg}_j(u)$ are orthonormal Legendre polynomials on
$0 < u < 1$. Define quantile function $Q(u; X)$, score function $S_j(u; X) = T_j(Q(u; X); X)$. Define
score data vectors $Sc(X) = (T_1(X; X), \ldots, T_m(X; X))$, $Sco(X) = (X - E[X], Sc(X))$; $m$
can vary with $X$. Define LP comoment matrix LP(X, Y), with entries LP(j, k; X, Y),
to be covariance matrix of Sc(X) and Sc(Y). Dependence is identified by estimating
LPINFOR(X, Y), a dependence measure estimated by sum of squares of largest LP
comoments (could use also multivariate algorithms to measure dependence).

We seek to also "look at the data" by estimating dependence dep(x, y; X, Y); copula
density cop(u, v; X, Y); comparison probability ComPr[Y = y | X = x]; comparison density
d(u; G, F) of distributions F and G, which enables marginal density estimator f(x) =
g(x) d(G(x)); conditional comparison density d(v; Y, Y | X = Q(u; X)). Bayes theorem can
be stated

$$d(v; Y, Y \mid X = Q(u; X)) = d(u; X, X \mid Y = Q(v; Y)) = \mathrm{cop}(u, v; X, Y).$$

We form orthogonal series estimators of copula density, marginal probability, conditional
expectations $E[Y \mid X]$, $E[T_k(Y; Y) \mid X]$ by linear combinations of score functions selected by
magnitude of LP comoments. We give novel representations of Var(X), COV(X) as linear
combination of LP comoments; when computed from data they provide diagnostics of tail
behavior and non-normal type dependence of (X, Y). We represent LPINFOR(X, Y) in
terms of conditional information LPINFOR(Y | X = Q(u; X)).



Keywords and phrases: Copula density, Conditional comparison density, LP co-moment, LPIN- 
FOR, Mid-distribution function, Orthonormal score function, Nonlinear dependence, Gini cor- 
relation, Extended multiple correlation, Quantile function, Parametric modeling, Algorithmic 
modeling, Nonparametric Quantile based information theoretic modeling, Translational research. 
1 UNITED STATISTICAL SCIENCE, MANY CULTURES

Breiman (2001) urged statisticians to be aware of two cultures:

1. Parametric modeling culture, pioneered by R. A. Fisher and Jerzy Neyman; 

2. Algorithmic predictive culture, pioneered by machine learning research. 

Parzen (2001), as a part of discussing Breiman (2001), proposed that researchers be aware of 
many cultures, including the focus of our research: 

3. Nonparametric, quantile based, information theoretic modeling.

Our research seeks to unify statistical problem solving in terms of comparison density, copula
density, measure of dependence, correlation, information, and new measures (called LP score
comoments) that apply to long tailed distributions without finite second order moments. A very
important goal is to unify methods for discrete and continuous random variables. We are
actively developing these ideas, which have a history of many decades, since Parzen (1979, 1983)
and Eubank et al. (1987). Our research extends these methods to modern high dimensional data
modeling.

The methods we discuss have an enormous literature. Our work states many new theorems. The 
goal of this paper is to describe new methods which are highly applicable towards the culture of 

4. Vigorous theory and methods for Translational Research, 

which differs from routine Applied Statistics because it adapts general methods to specific prob- 
lems posed by collaboration with scientists whose research problem involves probability mod- 
eling of nonlinear relationships, dependence, classification. Our motivation is: (A) Elegance, 
that comes from unifying methods that are not "black box computer intensive" but "look at 
the data"; (B) Utility, that comes from being applicable and quickly computable for traditional 
small sets and modern big data. 

2 (X,Y) MODELING, COPULA DENSITY 
2.1 ALGORITHMIC (X,Y) MODELING 

Step I. Plot sample quantile functions of X and Y. (Exploratory Data Analysis) 
Step II(a) Draw scatter plots (X, Y), (U, V), (U, Y). Also plot nonparametric regression E(Y |
X = x), E(Y | X = Q(u; X)), estimated by series of Legendre polynomials and score
functions constructed for each variable.

Step II(b) For (X discrete, Y discrete): Table dep(x, y; X, Y), Pr(X = x), Pr(Y = y),
Corr(X = x, Y = y).

A fundamental data analysis problem is to identify, estimate, and test models for (X, Y) where
X and Y are discrete or continuous random variables. We propose to model separately:

A. Univariate marginal distributions, quantile Q(u; X), Q(v; Y), mid distributions $F^{\mathrm{mid}}(x; X) =
F(x; X) - .5\,p(x; X)$, $F^{\mathrm{mid}}(y; Y) = F(y; Y) - .5\,p(y; Y)$;

B. Dependence of (X, Y); our new approach is to model the dependence of $(U, V) = (F^{\mathrm{mid}}(X; X),
F^{\mathrm{mid}}(Y; Y))$. $U$ is estimated in a sample of size $n$ by $U = F^{\mathrm{mid}}(X; X) = (\mathrm{Rank}(X) - .5)/n$.
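The sample mid-distribution transform in item B can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `mid_distribution` is an assumption introduced here. It evaluates F(x) - .5 p(x) from the sample distribution, which reduces to (Rank - .5)/n when there are no ties.

```python
from collections import Counter


def mid_distribution(sample):
    """Evaluate the sample mid-distribution F(x) - 0.5 * p(x) at each observation.

    Hypothetical helper for illustration. With no ties this equals
    (rank - 0.5) / n; with ties, tied observations share one value.
    """
    n = len(sample)
    counts = Counter(sample)
    cumulative = 0
    fmid = {}
    for value in sorted(counts):
        cumulative += counts[value]                       # n * F(value)
        fmid[value] = (cumulative - 0.5 * counts[value]) / n
    return [fmid[x] for x in sample]


if __name__ == "__main__":
    print(mid_distribution([3.0, 1.0, 2.0, 4.0]))  # no ties: (rank - .5)/n
    print(mid_distribution([0, 0, 1, 1, 1]))       # ties share a mid-rank
```

The transformed values always average to .5, which is the property used later when the first score function is standardized.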

2.2 COPULA DENSITY 

A general measure of dependence is the "copula density" cop(u, v; X, Y), 0 < u, v < 1. It is
usually defined for X and Y that are both continuous with joint probability density f(x, y; X, Y).
Define first the "normed joint density", pioneered in Hoeffding (1940), defined as the joint density
divided by product of the marginal densities, which we denote "dep" to emphasize that it is a
measure of dependence and independence:

$$\mathrm{dep}(x, y; X, Y) = \frac{f(x, y; X, Y)}{f(x; X)\, f(y; Y)}. \tag{2.1}$$

The relation of dependence to correlation is illustrated by the following formulas for X, Y discrete:

$$\mathrm{dep}(x, y; X, Y) = \frac{\Pr(X = x, Y = y)}{\Pr(X = x)\,\Pr(Y = y)}, \tag{2.2}$$

$$\mathrm{Corr}(X = x, Y = y) = \sqrt{\mathrm{odds}[\Pr(X = x)]\;\mathrm{odds}[\Pr(Y = y)]}\;\bigl(\mathrm{dep}(x, y; X, Y) - 1\bigr). \tag{2.3}$$

Fig. 1 illustrates these concepts for a 2 x 2 contingency table.

Figure 1: 2 x 2 contingency example (Aspirin X, Male Heart Attack Y). |Corr(X = x, Y = y)|
= .04; significance depends on n (20000 in famous experiment).
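As a numerical check of the relation between dep and correlation for discrete variables, the sketch below computes both from a table of counts, reading Corr(X = x, Y = y) as the correlation of the indicator variables I(X = x) and I(Y = y). The counts are made up for illustration and are not the aspirin data; the function names are assumptions.

```python
import math


def odds(p):
    """odds(p) = p / (1 - p)."""
    return p / (1.0 - p)


def dep_and_corr(table):
    """Return (dep, corr) matrices for a table of counts, table[i][j] = #(X = i, Y = j).

    dep is the normed joint probability; corr applies the odds formula
    for the correlation of the indicators I(X = i) and I(Y = j).
    """
    rows, cols = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    px = [sum(table[i]) / n for i in range(rows)]
    py = [sum(table[i][j] for i in range(rows)) / n for j in range(cols)]
    dep = [[0.0] * cols for _ in range(rows)]
    corr = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            dep[i][j] = (table[i][j] / n) / (px[i] * py[j])
            corr[i][j] = math.sqrt(odds(px[i]) * odds(py[j])) * (dep[i][j] - 1.0)
    return dep, corr


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    dep, corr = dep_and_corr([[30, 20], [10, 40]])
    print(dep)
    print(corr)
```

For a 2 x 2 table all four correlations have the same absolute value, which is why a single number summarizes the table.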
Our approach interprets the values of X and Y by their percentiles u and v, satisfying x =
Q(u; X), y = Q(v; Y).



Definition 2.1 (Copula Density). Copula density function of (X, Y) either both discrete or both
continuous

$$\mathrm{cop}(u, v; X, Y) = \mathrm{dep}\bigl(Q(u; X), Q(v; Y); X, Y\bigr). \tag{2.4}$$

Definition of copula density when X is continuous and Y is discrete is given in Section 8.

Theorem 2.2. When X and Y are jointly continuous, copula density function is the joint
density of rank transform variables U = F(X; X), V = F(Y; Y) with joint distribution function
F(u, v; U, V) = F(Q(u; X), Q(v; Y); X, Y), denoted by Cop(u, v; X, Y) and called copula
(connection) function, pioneered in 1958 by Sklar (Schweizer and Sklar, 1958, Sklar, 1996). The
copula density functions of (X, Y) and (U, V) are equal!

A major problem in applying and estimating copula densities is that the marginals of X and Y
are unknown. Our innovation is to use the mid-distribution functions of the sample marginal
distributions of X and Y to transform observed (X, Y) to (U, V), defining

$$U = F^{\mathrm{mid}}(X; X), \quad V = F^{\mathrm{mid}}(Y; Y). \tag{2.5}$$

As raw fully nonparametric estimator, we propose the copula density function cop(u, v; U, V)
of the discrete random variables U, V. We define below the concept of comparison probability
ComPr[Y = y | X = x] and conditional comparison density d(v; Y, Y | X = Q(u; X)), a special case
of comparison density d(u; G, F) of two univariate distributions F and G.

Example 2.3 (Geyser Yellowstone Data). X = Eruption length, Y = Waiting time to next 
eruption. 

3 SCORE FUNCTIONS 

3.1 ALGORITHMIC MODELING 

Step III. Plot score functions $S_j(u; X)$, $0 < u < 1$, and $S_k(v; Y)$, $0 < v < 1$, for $j, k = 1, \ldots, 4$.

Our goal is to nonparametrically estimate copula density function cop(u, v; X, Y), conditional
comparison density function, conditional regression quantile $E[g(Y) \mid X = Q(u; X)]$, conditional
quantiles $Q(v; Y \mid X = Q(u; X))$. Our approach is orthogonal series population representation
and sample estimators that are based on orthogonal score functions $S_j(u; X)$, $0 < u < 1$, and
$S_k(v; Y)$, $0 < v < 1$, that obey orthonormality conditions:

$$\int_0^1 S_j(u; X)\,du = 0, \quad \int_0^1 |S_j(u; X)|^2\,du = 1, \quad \text{and} \quad \int_0^1 S_{j_1}(u; X)\,S_{j_2}(u; X)\,du = 0 \ \text{ for } j_1 \neq j_2.$$



When X is discrete (which is always true when we describe X by its sample distribution) we
construct $S_j(u; X)$ from score function $T_j(x; X)$ by relations

$$S_j(u; X) = T_j(Q(u; X); X), \quad S_k(v; Y) = T_k(Q(v; Y); Y). \tag{3.1}$$

We construct score functions $T_j(x; X)$ to satisfy, for $j_1 \neq j_2$,

$$E[T_j(X; X)] = 0, \quad E[|T_j(X; X)|^2] = 1, \quad E[T_{j_1}(X; X)\,T_{j_2}(X; X)] = 0. \tag{3.2}$$

When X is continuous we construct $S_j(u; X)$ to be orthonormal shifted Legendre polynomials
on unit interval; we could alternatively use Hermite polynomials, or cosine and sine functions.
When X is discrete, our definition of score functions can be regarded as discrete Legendre
polynomials, and is based on the mid-rank transformation $F^{\mathrm{mid}}(X; X)$ which has mean
$E[F^{\mathrm{mid}}(X; X)] = .5$, variance

$$|\sigma^{\mathrm{mid}}|^2 = \mathrm{Var}[F^{\mathrm{mid}}(X; X)] = (1/12)\bigl(1 - E[|p(X; X)|^2]\bigr). \tag{3.3}$$

Definition 3.1 (Score Functions). $T_j$ at $x$ observable (positive probability):

$$T_1(x; X) = \frac{F^{\mathrm{mid}}(x; X) - .5}{\sigma^{\mathrm{mid}}}.$$

Construct $T_j(x; X)$ by Gram Schmidt orthonormalization of powers of $T_1(x; X)$. Score functions
$S_j(u; X) = T_j(Q(u; X); X)$ are piecewise constant on $0 < u < 1$; they have shapes similar to
Legendre polynomials.
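The construction in Definition 3.1 can be sketched in a few lines. This is an illustrative reading of the definition, assuming expectations are taken under the sample distribution; the names `mid_distribution` and `score_functions` are introduced here and are not from the paper.

```python
import math
from collections import Counter


def mid_distribution(sample):
    """Sample mid-distribution F(x) - 0.5 * p(x) at each observation."""
    n = len(sample)
    counts = Counter(sample)
    cumulative, fmid = 0, {}
    for value in sorted(counts):
        cumulative += counts[value]
        fmid[value] = (cumulative - 0.5 * counts[value]) / n
    return [fmid[x] for x in sample]


def score_functions(sample, m):
    """Return [T_1, ..., T_m]; T_j is the list of T_j(x_i; X) over the sample.

    T_1 is the standardized mid-distribution transform; higher T_j come from
    Gram-Schmidt applied to powers of T_1 with inner product <a, b> = mean(a * b).
    """
    n = len(sample)

    def inner(a, b):
        return sum(p * q for p, q in zip(a, b)) / n

    centered = [u - 0.5 for u in mid_distribution(sample)]
    sd = math.sqrt(inner(centered, centered))
    t1 = [c / sd for c in centered]
    basis = [[1.0] * n]  # T_0 = 1, so every later T_j has mean zero
    for j in range(1, m + 1):
        v = [t ** j for t in t1]
        for b in basis:  # remove components along earlier score functions
            c = inner(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))
        if norm < 1e-10:
            raise ValueError("m exceeds (number of distinct values - 1)")
        basis.append([vi / norm for vi in v])
    return basis[1:]


if __name__ == "__main__":
    binary = [0] * 3 + [1] * 7  # Pr(X = 1) = .7 in this toy sample
    print(score_functions(binary, 1)[0])
```

A variable with d distinct observed values supports at most d - 1 score functions, so the sketch raises an error rather than returning a numerically meaningless T_j.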

Example 3.2. For X taking values 0 or 1, Pr(X = 1) = p, Pr(X = 0) = q = 1 - p, $F^{\mathrm{mid}}(0; X) =
.5q$, $F^{\mathrm{mid}}(1; X) = 1 - .5p$, $\mathrm{Var}[F^{\mathrm{mid}}(X; X)] = (1/12)(1 - p^3 - q^3) = (1/4)pq$, $E[F^{\mathrm{mid}}(X; X)] = .5$.
Conclude that for X binary,

$$T_1(0; X) = -\sqrt{p/q}, \qquad T_1(1; X) = \sqrt{q/p}.$$



4 LP SCORE CO-MOMENTS LP(j, k; X, Y), COPULA DENSITY, ORTHOGONAL SERIES COEFFICIENTS

4.1 ALGORITHMIC MODELING

Step IV. Compute and display matrix of score comoments LP(j, k; X, Y) for j, k = 0, 1, ..., 4.

Step V. Compute $L^2$ estimator of copula density using smallest number of influential product score
functions determined by a model selection criterion, which balances model error (bias of
a model with few coefficients) and estimation error (variance that increases as we increase
the number of coefficients (statistical parameters) in the model).

Display $\mathrm{LPINFOR}(X, Y) = \sum_{j,k} |\mathrm{LP}(j, k; X, Y)|^2$ for m selected indices j, k; under
independence $n\,\mathrm{LPINFOR}(X, Y)$ is Chi-square distributed with m degrees of freedom, giving a data
driven chi-square test for X discrete, Y discrete. For 2 x 2 contingency table

$$\mathrm{LPINFOR}(X, Y) = |\mathrm{LP}(1, 1; X, Y)|^2 = |\mathrm{Corr}(X = x, Y = y)|^2. \tag{4.1}$$

Plot cop(u, v; X, Y) as a function of (u, v) and also one dimensional graphs cop(u, v; X, Y),
0 < v < 1, for selected u = .1, .25, .5, .75, .9.

Definition 4.1 (LP Co-moments). For j, k > 0,

$$\mathrm{LP}(j, k; X, Y) = E[T_j(X; X)\,T_k(Y; Y)].$$

Note that many traditional nonparametric statistics (Spearman rank correlation, Wilcoxon two 
sample rank sum statistics) are equivalent to LP(1, 1; X, Y). 
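A sample LP comoment matrix can be sketched as below, assuming the score functions are built by Gram-Schmidt under the sample distribution. The helper names are hypothetical. When there are no ties, the entry LP(1, 1) is the correlation of the ranks, so it equals 1 for an increasing relationship and -1 for a decreasing one.

```python
import math
from collections import Counter


def mid_distribution(sample):
    """Sample mid-distribution F(x) - 0.5 * p(x) at each observation."""
    n = len(sample)
    counts = Counter(sample)
    cumulative, fmid = 0, {}
    for value in sorted(counts):
        cumulative += counts[value]
        fmid[value] = (cumulative - 0.5 * counts[value]) / n
    return [fmid[x] for x in sample]


def score_functions(sample, m):
    """[T_1, ..., T_m] by Gram-Schmidt on powers of the standardized mid-ranks."""
    n = len(sample)

    def inner(a, b):
        return sum(p * q for p, q in zip(a, b)) / n

    centered = [u - 0.5 for u in mid_distribution(sample)]
    sd = math.sqrt(inner(centered, centered))
    t1 = [c / sd for c in centered]
    basis = [[1.0] * n]
    for j in range(1, m + 1):
        v = [t ** j for t in t1]
        for b in basis:
            c = inner(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))
        if norm < 1e-10:
            raise ValueError("m exceeds (number of distinct values - 1)")
        basis.append([vi / norm for vi in v])
    return basis[1:]


def lp_comoments(x, y, m):
    """m x m matrix whose entry [j-1][k-1] is the sample mean of T_j(x) * T_k(y)."""
    n = len(x)
    tx, ty = score_functions(x, m), score_functions(y, m)
    return [[sum(a * b for a, b in zip(tx[j], ty[k])) / n for k in range(m)]
            for j in range(m)]


if __name__ == "__main__":
    x = [(i * 37) % 101 for i in range(50)]  # 50 distinct toy values
    y = [v ** 3 for v in x]                  # monotone, nonlinear in x
    for row in lp_comoments(x, y, 3):
        print([round(v, 6) for v in row])
```

Because the score functions depend on the data only through mid-ranks, any increasing transformation of X or Y leaves the matrix unchanged.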

Theorem 4.2. LP comoments are coefficients $\theta_{L^2}(j, k; X, Y) = \mathrm{LP}(j, k; X, Y)$ of "naive" $L^2$
representations (estimators) of copula density as finite or infinite series of product score functions
(when rigor is sought, assume that copula density is square integrable)

$$\mathrm{cop}(u, v; X, Y) - 1 = \sum_{j, k > 0} \theta_{L^2}(j, k; X, Y)\,S_j(u; X)\,S_k(v; Y). \tag{4.2}$$
4.2 AIC MODEL SELECTION

For estimation of copula density we identify influential product score functions by rank ordering
squared LP score comoments. The criterion AIC(m) is the sum of the m largest squared LP
comoments minus 2m/n, where n is the sample size. Choose m product score functions, where m
maximizes AIC(m).
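The selection rule can be sketched as follows. The function name `aic_select`, the example matrix, and the convention that selecting nothing has AIC equal to 0 are assumptions made for this illustration.

```python
def aic_select(lp, n):
    """Select influential product score functions from an LP comoment matrix.

    lp[j-1][k-1] holds LP(j, k) for j, k >= 1; n is the sample size.
    Returns (selected (j, k) pairs, sum of their squares, AIC path).
    Assumption: the empty model is allowed and has AIC = 0.
    """
    ranked = sorted(
        ((lp[j][k] ** 2, j + 1, k + 1)
         for j in range(len(lp)) for k in range(len(lp[j]))),
        reverse=True,
    )
    best_m, best_aic, running = 0, 0.0, 0.0
    aic_path = []
    for m, (squared, _, _) in enumerate(ranked, start=1):
        running += squared
        aic = running - 2.0 * m / n  # sum of m largest squares minus 2m/n
        aic_path.append(aic)
        if aic > best_aic:
            best_m, best_aic = m, aic
    selected = [(j, k) for _, j, k in ranked[:best_m]]
    lpinfor = sum(squared for squared, _, _ in ranked[:best_m])
    return selected, lpinfor, aic_path


if __name__ == "__main__":
    # Hypothetical comoments, chosen only to show the thresholding.
    example = [[0.8, 0.05], [0.02, 0.3]]
    selected, lpinfor, path = aic_select(example, n=100)
    print(selected, lpinfor)
```

In this toy matrix the two small comoments add less than the 2/n penalty each, so only the indices (1, 1) and (2, 2) are retained.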

4.3 LPINFOR 

An information theoretic measure of dependence is LPINFOR, estimated by sum of squares of
LP comoments of influential product score functions determined by AIC.

4.4 MAXENT ESTIMATION OF COPULA DENSITY FUNCTION

"Exact" maximum entropy (exponential model) representation of copula density function models
log copula density as a linear combination of product score functions. The MaxEnt coefficients
$\theta_{ME}$ are computed by moment-matching estimating equations

$$E[S_j(u; X)\,S_k(v; Y) \mid \theta_{ME}] = \mathrm{LP}(j, k; X, Y). \tag{4.3}$$

5 LP SCORE MOMENTS, ZERO ORDER COMOMENTS 
5.1 ALGORITHMIC MODELING 

Step VI. Display LP score moments of X and Y as matrices LP(j, k; X, X) and LP(j, k; Y, Y). 

Definition 5.1 (Score Comoments). Alternatives to moments of a random variable X are its
score moments, defined

$$\mathrm{LP}(j; X) = \mathrm{LP}(0, j; X, X) = E[X\,T_j(X; X)] = \int_0^1 Q(u; X)\,S_j(u; X)\,du. \tag{5.1}$$

Theorem 5.2. Interpret LP score moments as coefficients of an orthogonal representation of
the quantile function

$$Q(u; X) - E(X) = \sum_{j > 0} \mathrm{LP}(j; X)\,S_j(u; X), \tag{5.2}$$

which leads to a very useful fact about variance of X

$$\mathrm{Var}(X) = \sum_{j > 0} |\mathrm{LP}(j; X)|^2. \tag{5.3}$$
Definition 5.3 (LP Tail Order). LP tail order of X is defined to be smallest integer m satisfying

$$\sum_{j=1}^{m} |\mathrm{LP}(j; X)|^2 / \mathrm{Var}(X) > .95. \tag{5.4}$$

One can show that for X Uniform, $\mathrm{Var}(X) = |\mathrm{LP}(1; X)|^2$; therefore tail order m = 1, and
all higher LP moments are zero. For X Normal, tail order m = 1 since

$$|\mathrm{LP}(1; X)|^2 / \mathrm{Var}(X) = 3/\pi = .955. \tag{5.5}$$
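Both tail-order claims can be checked by a short calculation, sketched here under the assumption that X is continuous, so that $T_1(X; X) = \sqrt{12}\,(F(X; X) - 1/2)$. The symbols $a$, $b$, $\mu$, $\sigma$, $Z$, $\varphi$, $\Phi$ are introduced only for this illustration.

```latex
% Uniform on (a, b): Q(u; X) = a + (b - a) u and T_1(X; X) = \sqrt{12}\,(U - 1/2), so
X - E[X] = \frac{b - a}{\sqrt{12}}\; T_1(X; X)
\;\Longrightarrow\;
\mathrm{LP}(1; X) = \frac{b - a}{\sqrt{12}}, \qquad
|\mathrm{LP}(1; X)|^2 = \frac{(b - a)^2}{12} = \mathrm{Var}(X).

% Normal: X = \mu + \sigma Z with Z standard normal, T_1(X; X) = \sqrt{12}\,(\Phi(Z) - 1/2).
% Stein's identity gives E[Z\,\Phi(Z)] = E[\varphi(Z)] = \int \varphi(z)^2\,dz = 1/(2\sqrt{\pi}), so
\mathrm{LP}(1; X) = \sigma \sqrt{12}\; E[Z\,\Phi(Z)] = \sigma \sqrt{3/\pi}
\;\Longrightarrow\;
|\mathrm{LP}(1; X)|^2 / \mathrm{Var}(X) = 3/\pi \approx 0.955.
```

In the Uniform case the first score function already reproduces X exactly, which is why all higher LP moments vanish; in the Normal case the first term alone exceeds the .95 threshold.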

5.2 L MOMENTS AND GINI COEFFICIENT 

When X is continuous, and score functions are Legendre polynomials, our LP score moments are
extensions of the concept of L moments extensively developed and applied by Hosking and Wallis
(1997). Our LP(1; X) is a modification of Gini mean difference coefficient, which is a measure of
scale. Measures of skewness and kurtosis are LP(2; X) and LP(3; X).

6 ZERO ORDER LP SCORE COMOMENTS, NONPARAMETRIC 
REGRESSION 

We extend the concept of comoments pioneered by Serfling and Xiao (2007) to define

$$\mathrm{LP}(j, 0; X, Y) = E[T_j(X; X)\,Y], \quad \mathrm{LP}(0, k; X, Y) = E[X\,T_k(Y; Y)]. \tag{6.1}$$

Theorem 6.1. Nonparametric nonlinear regression is equivalent to conditional expectation E(Y |
X); it satisfies $\mathrm{LP}(j, 0; X, Y) = E[T_j(X; X)\,E(Y \mid X)]$. Therefore

$$E[Y \mid X = Q(u; X)] - E[Y] = \sum_{j > 0} S_j(u; X)\,\mathrm{LP}(j, 0; X, Y). \tag{6.2}$$

We apply this formula to obtain "naive" estimators of conditional regression quantile E(Y | X = 
Q(u; X)), to be plotted on scatter plots of (X, Y). 
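The "naive" series estimator of the regression can be sketched as follows, truncating the sum at m score functions and evaluating it at the observed values of X. The helper names and the fixed truncation (no AIC selection) are assumptions of this illustration.

```python
import math
from collections import Counter


def mid_distribution(sample):
    """Sample mid-distribution F(x) - 0.5 * p(x) at each observation."""
    n = len(sample)
    counts = Counter(sample)
    cumulative, fmid = 0, {}
    for value in sorted(counts):
        cumulative += counts[value]
        fmid[value] = (cumulative - 0.5 * counts[value]) / n
    return [fmid[x] for x in sample]


def score_functions(sample, m):
    """[T_1, ..., T_m] by Gram-Schmidt on powers of the standardized mid-ranks."""
    n = len(sample)

    def inner(a, b):
        return sum(p * q for p, q in zip(a, b)) / n

    centered = [u - 0.5 for u in mid_distribution(sample)]
    sd = math.sqrt(inner(centered, centered))
    t1 = [c / sd for c in centered]
    basis = [[1.0] * n]
    for j in range(1, m + 1):
        v = [t ** j for t in t1]
        for b in basis:
            c = inner(v, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(inner(v, v))
        if norm < 1e-10:
            raise ValueError("m exceeds (number of distinct values - 1)")
        basis.append([vi / norm for vi in v])
    return basis[1:]


def lp_regression(x, y, m):
    """Fitted values of E[Y | X = x_i] and the coefficients LP(j, 0; X, Y), j = 1..m."""
    n = len(x)
    t = score_functions(x, m)
    y_bar = sum(y) / n
    coef = [sum(tji * yi for tji, yi in zip(tj, y)) / n for tj in t]
    fitted = [y_bar + sum(coef[j] * t[j][i] for j in range(m)) for i in range(n)]
    return fitted, coef


if __name__ == "__main__":
    x = [(i * 7) % 20 for i in range(20)]  # a permutation of 0..19 (toy data)
    y = [2 * v + 1 for v in x]
    fitted, coef = lp_regression(x, y, 3)
    print([round(c, 6) for c in coef])
```

For equally spaced X without ties the first score function is linear in x, so a linear relationship is reproduced by the first coefficient alone and the higher coefficients are numerically zero.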

6.1 EXTENDED MULTIPLE CORRELATION 

A nonlinear multiple correlation coefficient $R^2_{LP}$ is defined as

$$R^2_{LP} = \mathrm{Var}(E[Y \mid X]) / \mathrm{Var}(Y) = \sum_{j > 0} |\mathrm{LP}(j, 0; X, Y)|^2 / \mathrm{Var}(Y). \tag{6.3}$$
6.2 GINI CORRELATION

Defined by Schechtman and Yitzhaki (1987), it can be computed in our notation as

$$R_{\mathrm{Gini}}(Y \mid X) = \mathrm{LP}(1, 0; X, Y) / \mathrm{LP}(1, 0; Y, Y) = E[T_1(X; X)\,Y] / E[T_1(Y; Y)\,Y]. \tag{6.4}$$

Similarly define $R_{\mathrm{Gini}}(X \mid Y) = E[T_1(Y; Y)\,X] / E[T_1(X; X)\,X]$. The square of $R_{\mathrm{Gini}}(Y \mid X)$ should
be compared with our $R^2_{LP}$ (Eq. 6.3).

6.3 PEARSON CORRELATION 

R(X, Y) = Corr(X, Y) can be displayed in our LP matrix by defining

$$\mathrm{LP}(0, 0; X, Y) = R(X, Y)\,\sigma_X \sigma_Y = \mathrm{COV}(X, Y). \tag{6.5}$$

New measures of correlation: significant terms in representation of Pearson correlation

$$R(X, Y) = \sum_{j > 0} \mathrm{LP}(j, 0; X, X)\,\mathrm{LP}(j, 0; X, Y) / \sigma_X \sigma_Y. \tag{6.6}$$

7 BAYES THEOREM 

United statistical science aims to unify methods for continuous and discrete random variables. 
For Y discrete, X continuous, Bayes theorem can be stated

$$\frac{\Pr[Y = y \mid X = x]}{\Pr[Y = y]} = \frac{f(x; X \mid Y = y)}{f(x; X)}. \tag{7.1}$$

A proof follows from showing that

$$\Pr[Y = y \mid X = x]\,f(x; X) = f(x; X \mid Y = y)\,\Pr[Y = y]. \tag{7.2}$$

This equation can be interpreted as a formula for the joint probability of (X, Y). It can be
rewritten in two ways as a product of a conditional probability and an unconditional probability.
The normed joint density, which divides the joint probability by the product of marginal probabilities,
has two formulas, whose equality is the statement of Bayes Theorem.

At the heart of our approach is to express x and y by their percentiles u and v satisfying
x = Q(u; X), y = Q(v; Y). We write Bayes theorem for X continuous and Y discrete

$$\frac{\Pr[Y = Q(v; Y) \mid X = Q(u; X)]}{\Pr[Y = Q(v; Y)]} = \frac{f(Q(u; X); X \mid Y = Q(v; Y))}{f(Q(u; X); X)}. \tag{7.3}$$
Definition 7.1 (Copula Density X Discrete and Y Continuous). In terms of the concept of
comparison density d(u; G, F), defined below, Bayes Theorem can be stated as an equality of two
comparison densities whose value is defined to be copula density:

$$d(v; Y, Y \mid X = Q(u; X)) = d(u; X, X \mid Y = Q(v; Y)) = \mathrm{cop}(u, v; X, Y). \tag{7.4}$$

7.1 ODDS VERSION OF BAYES THEOREM 

When Y is binary 0-1 we express and apply Bayes theorem in terms of odds of a probability,
defined odds(p) = p/(1 - p):

$$\frac{\Pr[Y = 1 \mid X = x]}{\Pr[Y = 0 \mid X = x]} = \frac{\Pr[Y = 1]\,f(x; X \mid Y = 1)}{\Pr[Y = 0]\,f(x; X \mid Y = 0)}. \tag{7.5}$$

For logistic regression approach to estimating comparison density

$$d(u) = d(u; X, X \mid Y = 1) = f(Q(u; X); X \mid Y = 1) / f(Q(u; X); X), \tag{7.6}$$

define $p(u) = \Pr[Y = 1]\,d(u)$. One can then express Bayes Theorem for odds

$$\mathrm{odds}[\Pr(Y = 1 \mid X = Q(u; X))] = p(u) / (1 - p(u)) = \mathrm{odds}(p(u)). \tag{7.7}$$
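The odds form can be verified numerically in a toy two-class model. The sketch below assumes, purely for illustration, that X given Y = 0 is N(0, 1) and X given Y = 1 is N(1, 1); these distributions and the function names are not from the paper. The comparison density is evaluated at the point x = Q(u; X).

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def odds(p):
    """odds(p) = p / (1 - p)."""
    return p / (1.0 - p)


def posterior_via_comparison_density(x, prior1, mean0=0.0, mean1=1.0):
    """Return (p, d): d = f(x | Y = 1) / f(x) and p = Pr[Y = 1] * d.

    Hypothetical two-class normal model used only to illustrate the identity.
    """
    f1 = normal_pdf(x, mean1)
    f0 = normal_pdf(x, mean0)
    f = prior1 * f1 + (1.0 - prior1) * f0  # marginal density of X
    d = f1 / f                             # comparison density value at x = Q(u; X)
    p = prior1 * d                         # posterior probability Pr[Y = 1 | X = x]
    return p, d


if __name__ == "__main__":
    p, d = posterior_via_comparison_density(0.7, prior1=0.3)
    prior_odds_times_ratio = odds(0.3) * normal_pdf(0.7, 1.0) / normal_pdf(0.7, 0.0)
    print(odds(p), prior_odds_times_ratio)  # the two sides of the odds identity
```

The two printed numbers agree up to rounding, which is the statement that posterior odds equal prior odds times the likelihood ratio.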

7.2 LOGISTIC REGRESSION ESTIMATION OF COMPARISON DENSITY 

If one models log odds[Pr(Y = 1 | X = Q(u; X))], equivalently log odds(p(u)), as a linear
combination of score functions $S_j(u; X)$, the coefficients (parameters) can be quickly computed
(estimated) by logistic regression.

7.3 OTHER METHODS OF COMPARISON DENSITY ESTIMATION 

There are many approaches to forming an estimator $\hat d(u)$ of two sample comparison density
d(u), including: $L^2$ orthogonal series, maximum entropy (MaxEnt) exponential model, kernel
smoothing of a raw estimator of d.

Theorem 7.2 (Asymptotic variance of kernel comparison density estimator). Parzen (1983,
1999) demonstrated that kernel comparison density estimator $\hat p(u) = \Pr[Y = 1]\,\hat d(u)$ has
asymptotic variance for large sample size n

$$\mathrm{Var}[\hat p(u)] = p(u)\,(1 - p(u))\,M / n, \tag{7.8}$$

where M is a measure of equivalent number of parameters defining the estimator.
7.4 ASYMPTOTIC VARIANCE KERNEL RELATIVE DENSITY ESTIMATOR

In the two sample problem, distinguish comparison density d(u; H, G) and relative density d(u; F, G);
G denotes distribution of X in sample 1 (Y = 1), F is distribution of X in sample 2 (Y =
2), H is distribution of X in pooled (combined) sample. Study relative density (also known
as grade density) by defining prel(u) = (Pr[Y = 1]/Pr[Y = 2]) d(u; F, G). One can argue
(Parzen, 1999) that kernel density estimator of prel(u) has variance approximately proportional
to $\mathrm{prel}(u) + \mathrm{prel}(u)^2$.

8 COMPARISON DENSITY, COMPARISON PROBABILITY 

For (X, Y) discrete or continuous define comparison probability

$$\mathrm{ComPr}[Y = y \mid X = x] = \Pr[Y = y \mid X = x] / \Pr[Y = y], \quad Y \text{ discrete}, \tag{8.1}$$

$$\mathrm{ComPr}[Y = y \mid X = x] = f(y; Y \mid X = x) / f(y; Y), \quad Y \text{ continuous}.$$

Define comparison density as functions of u, v on unit interval satisfying x = Q(u; X), y =
Q(v; Y):

$$d(u; X, X \mid Y = Q(v; Y)) = \mathrm{ComPr}[X = Q(u; X) \mid Y = Q(v; Y)], \tag{8.2}$$

$$d(v; Y, Y \mid X = Q(u; X)) = \mathrm{ComPr}[Y = Q(v; Y) \mid X = Q(u; X)]. \tag{8.3}$$

9 UNIVARIATE DENSITY ESTIMATION BY COMPARISON DENSITY ORTHOGONAL SERIES

9.1 ALGORITHMIC MODELING

Step VII. Estimate marginal probability density of X and Y by estimating comparison density
function of true distribution with an initial parametric model for the distribution.

Let X be a continuous random variable whose probability density f(x; X) we seek to estimate
from a random sample $X_1, \ldots, X_n$. The comparison density approach chooses a distribution
function G(x) whose density function g(x) is such that f(x; X)/g(x) is a bounded function of x.
We call G a parametric start whose goodness of fit to the true distribution of X is tested by
estimating the comparison density. Let $Q_G(u)$ denote quantile function of G. Define comparison
distribution

$$D(u; G, F(\cdot; X)) = F(Q_G(u); X); \tag{9.1}$$

comparison density

$$d(u) = d(u; G, F(\cdot; X)) = f(Q_G(u); X) / g(Q_G(u)). \tag{9.2}$$

An estimator $\hat d(u)$ yields an estimator

$$\hat f(x; X) = g(x)\,\hat d(G(x)). \tag{9.3}$$

We interpret comparison density as probability density of U = G(X), called rank-G
transformation.

9.2 NEYMAN DENSITY ESTIMATOR 

A nonparametric estimator of d(u), pioneered by Neyman (1937) research on smooth goodness
of fit tests, can be represented

$$\hat d(u) = 1 + \sum_h \hat\theta_h\,S_h(u), \tag{9.4}$$

where score functions $S_h(u)$ are orthonormal shifted Legendre polynomials on unit interval, and

$$\hat\theta_h = \hat E[S_h(U)] = (1/n) \sum_j S_h[G(X_j)] = \mathrm{LP}[h; G(X)]. \tag{9.5}$$

Note $\mathrm{LP}(h; X) = E[X\,S_h(F^{\mathrm{mid}}(X; X))]$ provide diagnostics of scale, skewness, kurtosis, tails of
distribution of X. Complete definition of Neyman orthogonal series comparison density
estimator by selecting indices h by AIC based on sums of squares of ranked values of LP[h; G(X)].
Maximum index h is usually 4 for a unimodal distribution, and 8 for a bimodal distribution.
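A minimal sketch of the Neyman-type estimator follows, assuming a user-supplied parametric start with density `g` and distribution function `G` (both hypothetical arguments) and using all coefficients up to a fixed `max_h`; the AIC selection of indices described above is deliberately left out.

```python
import math


def legendre_scores(u, max_h):
    """Orthonormal shifted Legendre polynomials S_1(u), ..., S_max_h(u) on (0, 1)."""
    x = 2.0 * u - 1.0
    p_prev, p_curr = 1.0, x  # Legendre P_0 and P_1 on (-1, 1)
    scores = []
    for h in range(1, max_h + 1):
        scores.append(math.sqrt(2 * h + 1) * p_curr)
        # Three-term recurrence: (h + 1) P_{h+1} = (2h + 1) x P_h - h P_{h-1}
        p_prev, p_curr = p_curr, ((2 * h + 1) * x * p_curr - h * p_prev) / (h + 1)
    return scores


def neyman_coefficients(sample, G, max_h=4):
    """theta_h = (1/n) * sum_j S_h(G(X_j)) for h = 1..max_h."""
    n = len(sample)
    theta = [0.0] * max_h
    for value in sample:
        for h, s in enumerate(legendre_scores(G(value), max_h)):
            theta[h] += s / n
    return theta


def density_estimate(x, g, G, theta):
    """f_hat(x) = g(x) * d_hat(G(x)) with d_hat(u) = 1 + sum_h theta_h S_h(u)."""
    scores = legendre_scores(G(x), len(theta))
    d_hat = 1.0 + sum(t * s for t, s in zip(theta, scores))
    return g(x) * d_hat


if __name__ == "__main__":
    # Toy check: pseudo-sample from density 2x on (0, 1), uniform parametric start.
    sample = [math.sqrt((i + 0.5) / 1000) for i in range(1000)]
    theta = neyman_coefficients(sample, G=lambda x: x, max_h=4)
    print([round(t, 4) for t in theta])
    print(density_estimate(0.9, lambda x: 1.0, lambda x: x, theta))
```

In the toy check the true comparison density is 2u, which is exactly 1 plus a multiple of the first score function, so only the first coefficient should be far from zero. A truncated series of this kind is not guaranteed to be nonnegative.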

10 CONDITIONAL LP SCORE MOMENTS, CONDITIONAL LPINFOR REPRESENTATION

To identify and model dependence of (X, Y), omnibus measures are integrals over 0 < u, v < 1
of logarithm and square of copula density cop(u, v; X, Y); LPINFOR(X, Y) estimates integral
of square of copula density. For greater insight we should compute directional measures of
dependence, such as extended multiple correlation $R^2_{LP}(Y \mid X)$, and concepts introduced in this
section: conditional LP score moments LP(k; Y | X = Q(u; X)); conditional LPINFOR, denoted
LPINFOR(Y | X = Q(u; X)).
Definition 10.1.

$$\mathrm{LPINFOR}[Y \mid X = Q(u; X)] = \int_0^1 |d(v; Y, Y \mid X = Q(u; X)) - 1|^2\,dv, \tag{10.1}$$

$$\mathrm{LP}[k; Y \mid X = Q(u; X)] = \int_0^1 S_k(v; Y)\,d(v; Y, Y \mid X = Q(u; X))\,dv \tag{10.2}$$

$$= E[T_k(Y; Y) \mid X = Q(u; X)] = \sum_{j > 0} S_j(u; X)\,\mathrm{LP}(j, k; X, Y).$$

Theorem 10.2 (Conditional LPINFOR Representation of LPINFOR).

$$\mathrm{LPINFOR}(X, Y) = \int_0^1 \mathrm{LPINFOR}(Y \mid X = Q(u; X))\,du = \sum_{j, k > 0} |\mathrm{LP}(j, k; X, Y)|^2, \tag{10.3}$$

$$\mathrm{LPINFOR}(Y \mid X = Q(u; X)) = \sum_{k > 0} |\mathrm{LP}(k; Y \mid X = Q(u; X))|^2. \tag{10.4}$$

Use variable selection criteria to choose indices in representation for LPINFOR(Y | X = Q(u; X))
to estimate it. Plot LPINFOR(Y | X = Q(u; X)) on 0 < u < 1 to help interpretation of
LPINFOR(X, Y).

A more convenient way to compute LP(k; Y | X = Q(u; X)) when X is discrete:

$$\mathrm{Corr}\bigl(T_k(Y; Y), I(X = x)\bigr) = \sqrt{\mathrm{odds}(\Pr[X = x])}\;\mathrm{LP}(k; Y \mid X = x). \tag{10.5}$$

This generalizes the formula for the 2 by 2 contingency table of variables X, Y = 0 or 1:

$$|E[T_1(X; X)\,T_1(Y; Y)]| = |\mathrm{Corr}(X = x, Y = y)| = |\mathrm{LP}(1, 1; X, Y)|. \tag{10.6}$$

These concepts can be applied to traditional statistical problems:

X continuous, Y continuous    Regression (linear and non-linear)     E(Y | X), E(Y | F^mid(X)).
X binary, Y continuous        Two sample                             E(Y | X = 1), E(F^mid(Y) | X
X discrete, Y continuous      Multi-sample (analysis of variance)    E(Y | X = j), E(F^mid(Y) | X
X continuous, Y binary        Logistic regression                    E[I(Y = 1) | X].
X continuous, Y discrete      Multiple logistic regression           E[I(Y = j) | X].
X binary, Y binary            2 x 2 Contingency table                E[I(Y = 1) | X = x].
X discrete, Y discrete        r by c Contingency table               E[I(Y = y) | X = x].
When X and Y are vectors, a measure of their dependence is coherence, defined as trace of

$$\mathrm{COH}(X, Y) = K_{XX}^{-1} K_{XY} K_{YY}^{-1} K_{YX}. \tag{10.7}$$

Our measure LPINFOR(X, Y) can be regarded as a coherence.

11 HIGHLIGHTS OF ENORMOUS RELATED LITERATURE 

SAMPLE QUANTILES: Parzen (2004a,b), Parzen and Gupta (2004), Ma et al. (2011) study
sample quantile Q(u), mid-quantile $Q^{\mathrm{mid}}(u)$, informative quantile $\mathrm{QIQ}(u) = (Q^{\mathrm{mid}}(u) - \mathrm{MQ})/(2\,\mathrm{IQR})$.

NONPARAMETRIC ORTHOGONAL SERIES ESTIMATORS COPULA DENSITY: Comprehensively
studied by Kallenberg (2009); pioneering theory by Rodel (1987).

NONPARAMETRIC ORTHOGONAL UNIVARIATE DENSITY ESTIMATORS: Comprehensively
studied by Provost and Jiang (2012).

RELATIVE DENSITY ESTIMATION: Popularized by Handcock and Morris (1999). 

ASYMPTOTIC THEORY MAXENT EXPONENTIAL DENSITY ESTIMATORS: Barron and 
Sheu (1991). 

GOODNESS OF FIT DATA DRIVEN TESTS: Ledwina (1994), Rayner et al. (2009), Thas 
(2010). 
12 GEYSER DATA ANALYSIS

Geyser data is our role model for understanding the canonical (X, Y) problem. Here we
present some results which aim to prescribe a systematic and comprehensive approach for
understanding (X, Y) data. Terence Speed, in the IMS Bulletin 15, March 2012 issue, asked whether the
dependence between Eruption duration and Waiting time is linear. Our framework allows us to
give a complete picture, encompassing marginal to joint behavior of Eruption and Waiting time.
Figure 2: Eruption Duration.

[Panel titles: WAITING TIME; QIQ Plot of WAITING TIME.]

Figure 3: Waiting Time.

[Scatter plot panels; recoverable horizontal axis labels: Eruption Duration, Fmid(X), Fmid(X).]

Figure 4: Three Scatter Plot.
[Panel title: Score Functions of Eruption Duration.]

Figure 5: Eruption Duration.

[Panel title: Score Functions of Waiting Time.]

Figure 6: Waiting Time.

[Panel titles: AIC Selection: How Many Basis Functions to Choose?; (b) AIC; (c) Function of Number of Basis Functions.]

Figure 7: (a) LP moments of Eruption and Waiting time; (b) data adaptive thresholding using
AIC; (c) value of LPINFOR as a function of the number of basis functions. The LP-comoment based
measure is .69 and $\rho$ = .9. In the scatter plot, a large number of points accumulate near the
bottom left and top right corners, which artificially inflates the Pearson correlation measure,
whereas our method captures the right degree of correlation as a form of tail-dependence, evident from
the LP comoment matrix.
[Panel title: Rank Ordered Absolute Values of BETA.]

Figure 8: Density estimation via Comparison density.

[Panel titles: Eruption vs. Waiting Time; Q(u; Eruption) vs. Waiting Time.]

Figure 10: Regression.
[Panel title: Copula Density of (Eruption, Waiting).]

Figure 11: Shape of the estimated ($L^2$) nonparametric copula density based on AIC selected
product basis functions $S_j(X)\,S_k(Y)$, $j, k = 0, 1, \ldots, 4$, where $S_0(X) = S_0(Y) = 1$. It gives a
complete and remarkably accurate picture of the (tail) dependency. Compare the scatter plot of
3(c).